Counting objects in digital images is a task that should be performed by machines. It is tedious, time consuming, and prone to errors caused by the fatigue of human annotators. The goal is a system that takes an image as input and returns a count of the objects in it, together with a justification for the prediction in the form of object localization. We reformulate a problem originally posed by Lempitsky and Zisserman: instead of a density map, we predict a count map that contains redundant counts based on the receptive field of a smaller regression network. The regression network predicts the number of objects inside its receptive field (the window). When the image is processed in a fully convolutional way, each pixel is accounted for once by every window that includes it, and the number of such windows equals the window area (e.g., 32x32 = 1024). To recover the true count we take the average over the redundant predictions. Our contribution is redundant counting, used in place of density map prediction, in order to average over errors. We also propose a novel deep neural network architecture adapted from the Inception family of networks, called the Count-ception network. Together, these result in a 20% relative improvement (2.9 to 2.3 MAE) over the state-of-the-art method by Xie, Noble, and Zisserman (2016).
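The redundant counting idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `redundant_count_map` and `recover_count`, the zero padding by `r - 1` pixels, and the use of ground-truth dot annotations in place of network predictions are all assumptions made for illustration. The sketch builds a count map in which each cell holds the number of annotated objects inside one r x r window, then recovers the total count by dividing the sum of the map by the window area.

```python
import numpy as np


def redundant_count_map(dots: np.ndarray, r: int) -> np.ndarray:
    """Sum dot annotations inside every r x r window (stride 1).

    `dots` is a 2D array with 1 at each annotated object and 0 elsewhere.
    The image is zero padded by r - 1 on each side (an assumption of this
    sketch) so that every original pixel falls inside exactly r * r windows.
    """
    h, w = dots.shape
    padded = np.pad(dots, r - 1, mode="constant")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(axis=0).cumsum(axis=1)
    out_h, out_w = h + r - 1, w + r - 1
    # Window sum from four corners of the integral image.
    return (
        integral[r:r + out_h, r:r + out_w]
        - integral[:out_h, r:r + out_w]
        - integral[r:r + out_h, :out_w]
        + integral[:out_h, :out_w]
    )


def recover_count(count_map: np.ndarray, r: int) -> float:
    """Average over the redundant counts: each object was counted r * r times."""
    return float(count_map.sum()) / (r * r)


if __name__ == "__main__":
    # Hypothetical 8 x 8 annotation with three objects and a 4 x 4 window.
    dots = np.zeros((8, 8), dtype=np.int64)
    dots[0, 0] = dots[3, 5] = dots[7, 7] = 1
    count_map = redundant_count_map(dots, r=4)
    print(count_map.shape, recover_count(count_map, r=4))
```

In the setting described above, a regression network would predict each cell of this map from the image, and the same division by the window area would average out the per-window prediction errors.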